feat: alluxio-py support alluxio and oss filesystem #76
base: main
The implementation of the delegated filesystem for Alluxio and OSS has been completed. Specific notes:
1. Users must specify in the configuration, via the alluxio_enable flag, whether the delegated filesystem should be accelerated by Alluxio. If the flag is set to true, the configuration file must still include the initialization settings required for the Alluxio filesystem.
2. The configuration file can include multiple OSS filesystems as delegated filesystems, but their bucket_name values must be unique. A delegated filesystem is uniquely identified by the combination of the delegated filesystem name and the bucket_name.
local_close = os.close
local_mkdir = os.mkdir
local_remove = os.remove
local_rmdir = os.rmdir
Wouldn't it be better to separate the interface from the implementation?
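One possible reading of this suggestion, as a minimal sketch: declare the filesystem operations in an abstract interface and keep the local `os`-backed behavior in a separate class. The names `FileSystemInterface` and `LocalFileSystem` are hypothetical and do not appear in the PR.

```python
import os
from abc import ABC, abstractmethod


class FileSystemInterface(ABC):
    """Hypothetical interface: declares the operations, contains no logic."""

    @abstractmethod
    def mkdir(self, path: str) -> None: ...

    @abstractmethod
    def rmdir(self, path: str) -> None: ...

    @abstractmethod
    def remove(self, path: str) -> None: ...


class LocalFileSystem(FileSystemInterface):
    """Hypothetical implementation that forwards to the os module."""

    def mkdir(self, path: str) -> None:
        os.mkdir(path)

    def rmdir(self, path: str) -> None:
        os.rmdir(path)

    def remove(self, path: str) -> None:
        os.remove(path)
```

With this split, an OSS or Alluxio backend would subclass the same interface, and callers would depend only on `FileSystemInterface`.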
    Constants.S3_FILESYSTEM_TYPE: self._validate_s3_config
}

def _load_config(self):
A hot-update interface is needed, for example for a user who has no configuration file and wants to set the configuration through the API.
alluxio/posix/config.py
Outdated
    return self.config_data.keys()

@staticmethod
def _validate_oss_config(config):
Should the validation for each UFS live in its own file? Many more UFS types will certainly be added in the future.
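A minimal sketch of how per-UFS validation could be split up, assuming a small registry that each UFS module adds itself to. The names `register_validator` and `validate`, and the required OSS keys, are illustrative assumptions and not taken from the PR.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a filesystem type to its validator.
_VALIDATORS: Dict[str, Callable[[dict], None]] = {}


def register_validator(fs_type: str):
    """Decorator that registers a validator for one UFS type."""
    def decorator(func):
        _VALIDATORS[fs_type] = func
        return func
    return decorator


# In a split layout this function would live in its own module, one per UFS.
@register_validator("oss")
def validate_oss_config(config: dict) -> None:
    # The required keys below are assumed for illustration only.
    required = ("bucket_name", "endpoint")
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError(f"Missing OSS config keys: {missing}")


def validate(fs_type: str, config: dict) -> None:
    """Dispatch to the validator registered for fs_type."""
    try:
        validator = _VALIDATORS[fs_type]
    except KeyError:
        raise ValueError(f"No validator registered for '{fs_type}'") from None
    validator(config)
```

Adding a new UFS then means adding one module with one decorated function, without touching the dispatch code.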
alluxio/posix/config.py
Outdated
@staticmethod
def _validate_s3_config(config):
The S3 interface also needs to be implemented.
alluxio/posix/delegate.py
Outdated
@@ -0,0 +1,14 @@
import os
from alluxio.posix import fileimpl
config_manager = fileimpl.ConfigManager("../../config/ufs_config.yaml")
The config path must not be hard-coded. It has to be configurable dynamically, for example through an environment variable or API injection.
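As a sketch of what dynamic configuration could look like, the helper below prefers a path passed in by the caller and falls back to an environment variable. The function name `resolve_config_path` and the variable name `ALLUXIO_UFS_CONFIG_PATH` are assumptions made for this example; the PR defines neither.

```python
import os
from typing import Optional

# Hypothetical environment variable name, not defined by the PR.
CONFIG_ENV_VAR = "ALLUXIO_UFS_CONFIG_PATH"


def resolve_config_path(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the config path: explicit argument first, then the environment.

    Returns None when neither is set, so the caller can rely on
    API-injected configuration instead of a file.
    """
    if explicit_path:
        return explicit_path
    return os.environ.get(CONFIG_ENV_VAR)
```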
alluxio/posix/demo.py
Outdated
def delegatefs_open_write():
    write_file_path = f'oss://alhz-ossp-alluxio-test/alluxio-py/delegatefs-io-1.txt'
Please move this file to tests.
…mits.Date:2024-09-20
alluxio/posix/config.py
Outdated
def get_config_fs_list(self) -> list:
    return self.config_data.keys()

def update_config(self, fs_type, key, value):
Pass the key-value pairs in through ** (keyword arguments).
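A minimal sketch of that signature change, using a stripped-down stand-in for the PR's `ConfigManager`. Only `config_data` and `update_config` come from the diff; the rest of the class is assumed for illustration.

```python
class ConfigManager:
    """Simplified stand-in for the PR's ConfigManager (illustration only)."""

    def __init__(self):
        # Maps a filesystem type to its settings dictionary.
        self.config_data = {}

    def update_config(self, fs_type: str, **kwargs) -> None:
        """Merge any number of key-value pairs into the config of fs_type."""
        self.config_data.setdefault(fs_type, {}).update(kwargs)


manager = ConfigManager()
# Several settings can be updated in one call; the values are placeholders.
manager.update_config("oss", bucket_name="example-bucket", alluxio_enable=False)
```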
alluxio/posix/config.py
Outdated
def _load_config(self):
    if not os.path.exists(self.config_file_path):
        raise FileNotFoundError(f"{self.config_file_path} does not exist.")
This should not depend entirely on the configuration file; it should also be possible to configure everything through hot update, with no configuration file at all. If the required configuration is missing at the time it is used, an exception can be thrown then.
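The behavior described here could be sketched as follows: construction never fails for a missing file, and the error is raised only when a filesystem's configuration is requested but absent. The class name `LazyConfigManager` and the methods `set_config` and `get_config` are hypothetical, and file parsing is deliberately left out.

```python
from typing import Optional


class LazyConfigManager:
    """Hypothetical sketch: a config file is optional, lookups fail late."""

    def __init__(self, initial_config: Optional[dict] = None):
        # initial_config stands in for data parsed from an optional file.
        self.config_data = dict(initial_config or {})

    def set_config(self, fs_type: str, config: dict) -> None:
        """Hot update: replace the settings of one filesystem type."""
        self.config_data[fs_type] = dict(config)

    def get_config(self, fs_type: str) -> dict:
        """Raise at use time, not at startup, when a config is missing."""
        try:
            return self.config_data[fs_type]
        except KeyError:
            raise KeyError(f"No configuration found for '{fs_type}'") from None
```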
alluxio/posix/fileimpl.py
Outdated
def open(file: str, mode: str = "r", **kw):
    logging.info("DelegateFileSystem opening file: %s", file)
Change info to debug.
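A short sketch of the requested change, assuming a module-level logger instead of the root `logging` functions. The logger name and the helper `log_open` are illustrative and not part of the PR.

```python
import logging

# The logger name is an assumption made for this example.
logger = logging.getLogger("alluxio.posix.fileimpl")


def log_open(file: str) -> None:
    # Per-call tracing is logged at DEBUG, so it stays silent unless
    # the logger is explicitly configured to emit debug messages.
    logger.debug("DelegateFileSystem opening file: %s", file)
```

Per-file messages like this one are emitted on every call, which is why DEBUG is the usual level for them.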
fs = instance.get_file_system(file)
if fs:
    try:
        return fs.open(file, mode, **kw)
Through fsspec reflection, parameters can be passed directly into the UFS client. Is there no need to validate the config parameters?
alluxio/posix/fileimpl.py
Outdated
fs = instance.get_file_system(path)
if fs:
    try:
        logging.info("DelegateFileSystem getStatus filemeta: %s", path)
Check the log level here as well.
alluxio/posix/fileimpl.py
Outdated
    return local_rename(src, dest, **kw)


class DelegateFileSystem:
Move this class into a separate file.
alluxio/posix/fileimpl.py
Outdated
    self.__init__file__system()
    DelegateFileSystem.instance = self

def __create__file__system(self, fs_name: str):
Isn't the fs name alone insufficient to guarantee uniqueness? For example, there can be multiple buckets under the same UFS.
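One way to address this, sketched under the assumption that paths look like `oss://bucket/key`, is to key each filesystem by the pair (fs name, bucket name) instead of the fs name alone. `filesystem_key` and `FileSystemRegistry` are hypothetical names; only `get_file_system` mirrors a method visible in the diff.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def filesystem_key(path: str) -> Tuple[str, str]:
    """Return (fs_name, bucket_name) for a URL such as 'oss://bucket/key'."""
    parsed = urlparse(path)
    return parsed.scheme, parsed.netloc


class FileSystemRegistry:
    """Hypothetical registry keyed by (fs_name, bucket_name)."""

    def __init__(self):
        self._filesystems: Dict[Tuple[str, str], object] = {}

    def register(self, fs_name: str, bucket_name: str, fs: object) -> None:
        key = (fs_name, bucket_name)
        if key in self._filesystems:
            raise ValueError(f"Duplicate filesystem: {key}")
        self._filesystems[key] = fs

    def get_file_system(self, path: str):
        # Returns None when nothing matches, for example for local paths.
        return self._filesystems.get(filesystem_key(path))
```

Two OSS buckets then resolve to two distinct filesystem objects even though they share the `oss` name.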
…mits.Date:2024-09-29